Can Automatic Speech Recognition Learn More from Human Speech Perception?

Authors

Sorin DUSAN

Lawrence R. RABINER

Abstract

Although a great deal of progress has been made during the last two decades in automatic speech recognition (ASR), the performance of ASR systems, as measured by word recognition and concept understanding error rates, is still much worse than that achieved by humans, even for carefully read and articulated speech in quiet conditions. This gap between machines and humans widens further in noisy conditions and for conversational speech. Steadily increasing computational speed and computer memory impose fewer and fewer constraints on the types and the amount of recognition processing that can be brought to bear on a particular recognition task. In spite of the increased computation and memory, the state-of-the-art technology in automatic speech recognition appears to have reached a plateau in the past few years. New techniques and principles need to be invented or applied in order to substantially reduce the current performance gap in speech recognition between humans and machines. This paper presents some ideas intended to stimulate further research on applying knowledge and principles derived from studies of human speech perception to automatic speech recognition. Although the mechanisms of human speech perception (HSP) are not fully understood, some findings from neuroscience, physiology, cognitive science and psychology could potentially lead to new understanding and thereby stimulate the development of new techniques and architectures for automatic speech recognition that will eventually narrow the performance gap between machines and humans.

Similar resources

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among these, lip-reading techniques face many challenges for speech recognition, one of the challenges b...

Improving the performance of MFCC for Persian robust speech recognition

The Mel Frequency Cepstral Coefficients (MFCC) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...

Publication date: 2005
